Experiment: ol17_e1.
This document follows the same structure as the main manuscript and presents figures addressing the first two research questions.
Experiment- and individual-level figures for the third research question (Role of set size 1) are in a separate document.
There are a number of ways of summarizing individuals’ data when modeling them or presenting the results. Depending on the particular statistic (model fits, predictions, etc.), some approaches are more common or sensible than others. In the manuscript we presented the model-fitting results in two ways. Whenever we discussed model comparison, we referred to models fitted to individuals, with fits averaged across individuals. Whenever we discussed the models’ predictions, on the other hand, we looked at predictions based on the models’ fits to data aggregated across individuals within one experiment. We chose this approach to avoid inferences being unduly influenced by noisy predictions based on unstable fits to sparse individual data. Nonetheless, this is not the only possible approach.
In this supplemental material, we show a variety of approaches for summarizing the data to complement the choices we made for the manuscript. For all three research questions, we will show experiment-level and individual-level plots corresponding to the information provided in the manuscript: stability of model fits, model comparison by AIC (and cross-validation for individual-level data), predicted and observed error distributions, predicted and observed summary statistics, and normalized root mean square deviation (NRMSD). For the experiment-level data in particular, there are various approaches to aggregating or averaging the data, so we explain these in more detail in the separate tabs below.
By experiment-level data, we mean the data in ol17_e1 as a collection of the data provided by all 19 participants. The manner in which the data, model-fitting results, and predictions of an experiment are summarized can vary depending on the goal and approach. For some aspects of the data, some approaches make more sense than others. For completeness’ sake we show a variety here, even if some are somewhat nonsensical.
Here we treated all data in ol17_e1 as if they had been provided by a single participant (rather than by 19 participants). We fitted all models to these aggregate data. The model comparison shows relative model fits on the basis of the fit to the aggregate data.
In the error distribution and summary statistics graphs, the data are the observed data aggregated across individuals. The predictions (error distributions, resultant summary statistics, and normalized RMSD relative to the aggregate data) are based on the best-fit parameter estimates of the aggregate fit for all models.
In this tab, all graphs are shown for the purpose of illustrating the performance in the experiment on average, i.e., averaged across participants.
The model comparison plot is based on averaging models’ fits relative to the best-fit model. We show two rows of graphs. The top row includes all participants, and the bottom row excludes participants with extreme test/hold-out set deviance in LOSsO-CV for one of the models, to allow comparison of model comparison results across approaches (this was also done in the model comparison plots in the manuscript). Here in ol17_e1, we excluded 4 participants (ol17_e1_15, ol17_e1_18, ol17_e1_7, ol17_e1_9); the number of participants in each panel is shown in the top-right corner.
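As a rough illustration of how such an averaged model comparison can be computed, the sketch below averages each model's AIC difference from the best model and reports the number of participants kept. It assumes that "relative fit" means the AIC difference from the best-fitting model within each participant; the participant labels, model names, and AIC values are hypothetical and are not taken from ol17_e1.

```python
from statistics import mean


def delta_aic(aic_by_model):
    """Return each model's AIC relative to the lowest-AIC model."""
    best = min(aic_by_model.values())
    return {model: aic - best for model, aic in aic_by_model.items()}


def mean_delta_aic(aic_by_participant, excluded=()):
    """Average per-participant delta-AIC values, skipping excluded participants.

    Returns the per-model means and the number of participants included.
    """
    kept = {p: a for p, a in aic_by_participant.items() if p not in excluded}
    deltas = [delta_aic(a) for a in kept.values()]
    models = list(deltas[0].keys())
    return {m: mean(d[m] for d in deltas) for m in models}, len(kept)


# Hypothetical AIC values for three participants and two placeholder models.
aic = {
    "p1": {"model_a": 100.0, "model_b": 104.0},
    "p2": {"model_a": 210.0, "model_b": 208.0},
    "p3": {"model_a": 150.0, "model_b": 190.0},
}

all_means, n_all = mean_delta_aic(aic)                     # analogous to the top row
kept_means, n_kept = mean_delta_aic(aic, excluded={"p3"})  # analogous to the bottom row
print(n_all, all_means)
print(n_kept, kept_means)
```

In this toy example a single participant with an extreme value for one model dominates the average, which is the kind of influence the bottom row is meant to remove.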
The predictions of the behavioral signature pattern (error distribution and resultant summary statistics) are based on averaging the parameter estimates of the best-fit individual fits. The observed data is given by the aggregate data for both graphs.
In the tab “Averaging individual parameter estimates”, we looked at the prediction of error distributions and resultant summary statistics on the basis of averaging participants’ best-fit parameter estimates. Here, the predictions are based on averaging individuals’ predictions (which were based on their best-fit parameter estimates). For the summary statistics (across set sizes, and normalized RMSD) we show two graphs: A) deriving the summary statistics from the averaged error distributions (for NRMSDs, compared to the summary statistics of the aggregate data), and B) averaging individuals’ summary statistics, and similarly aggregating individuals’ normalized RMSDs.
Plots labeled “A”. The following summary statistics are derived from the averaged error distribution above. The graphs with the NRMSD show the comparison of these averaged predictions to the aggregate data. This is not particularly useful, as the distribution underlying the aggregate data is not necessarily the average of the individuals’ data, but it is one approach to summarizing the data.
Plots labeled “B”. The following summary statistics are based on averaging participants’ summary statistics, for both the data (e.g., the summary statistics derived from individual observed error distributions) and the model predictions (e.g., the summary statistics derived from individual predicted error distributions). The graphs with the NRMSD show the normalized RMSD across individuals as boxplots to provide an idea of the spread of the NRMSD in the experiment for these models (i.e., NOT the normalized RMSD derived from contrasting averaged predicted and observed summary statistics).
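The difference between predicting from averaged parameter estimates and averaging individuals' predictions can be seen in a minimal sketch. The model below is a hypothetical stand-in (a discretized, von Mises-shaped error distribution with a single concentration parameter `kappa`), not one of the models fitted here, and the parameter values are invented for illustration.

```python
import math


def predicted_distribution(kappa, n_bins=9):
    """Hypothetical stand-in model: binned error distribution over [-pi, pi)."""
    width = 2 * math.pi / n_bins
    centers = [-math.pi + (i + 0.5) * width for i in range(n_bins)]
    weights = [math.exp(kappa * math.cos(x)) for x in centers]
    total = sum(weights)
    return [w / total for w in weights]


def average_distributions(dists):
    """Bin-wise mean of several predicted distributions."""
    return [sum(col) / len(dists) for col in zip(*dists)]


# Hypothetical best-fit estimates for two participants.
kappas = [0.5, 8.0]

# Predict once from the averaged parameter estimate.
from_mean_params = predicted_distribution(sum(kappas) / len(kappas))

# Predict per participant, then average the predictions.
mean_of_predictions = average_distributions(
    [predicted_distribution(k) for k in kappas]
)

largest_gap = max(
    abs(a - b) for a, b in zip(from_mean_params, mean_of_predictions)
)
print(round(largest_gap, 3))
```

Because the prediction is a nonlinear function of the parameter, the two routes generally give different distributions, which is why both are shown separately.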
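The distinction drawn for the “B” plots can be sketched as follows. This is only an illustration: the normalization (dividing the RMSD by the range of the observed values) is an assumption, since the normalization is not spelled out here, and the participants and summary-statistic values are hypothetical.

```python
import math
from statistics import mean


def nrmsd(predicted, observed):
    """RMSD divided by the observed range (assumed normalization)."""
    rmsd = math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))
    return rmsd / (max(observed) - min(observed))


def column_means(rows):
    """Average a summary statistic across participants, per set size."""
    return [mean(col) for col in zip(*rows)]


# Hypothetical summary statistic at three set sizes for two participants.
observed = {"p1": [10.0, 20.0, 30.0], "p2": [20.0, 30.0, 50.0]}
predicted = {"p1": [12.0, 20.0, 28.0], "p2": [20.0, 36.0, 44.0]}

# What the boxplots summarize: one NRMSD per participant.
per_participant = {p: nrmsd(predicted[p], observed[p]) for p in observed}

# What the boxplots do NOT show: a single NRMSD computed from the
# averaged predicted and averaged observed summary statistics.
nrmsd_of_means = nrmsd(
    column_means(predicted.values()), column_means(observed.values())
)

print(per_participant)
print(nrmsd_of_means)
```

The per-participant values carry information about spread across individuals that a single NRMSD on averaged statistics cannot convey.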
Below are all graphs for all individuals in the experiment. Additionally, we included one graph showing the stability of the AIC model fits for all models by plotting the difference in deviance terms between the best and second-best run of each model for each individual’s data set.
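The stability measure described above can be sketched in a few lines. The run deviances, participant labels, and model names below are hypothetical; the sketch only assumes that each model was fitted several times per data set and that a lower deviance indicates a better fit.

```python
def stability_gap(run_deviances):
    """Deviance difference between the second-best and the best run."""
    if len(run_deviances) < 2:
        raise ValueError("need at least two runs to assess stability")
    best, second_best = sorted(run_deviances)[:2]
    return second_best - best


# Hypothetical deviances from repeated runs, keyed by (participant, model).
runs = {
    ("p1", "model_a"): [400.0, 400.5, 412.0],
    ("p1", "model_b"): [395.0, 395.0, 395.25],
    ("p2", "model_a"): [610.0, 640.0, 611.0],
}

gaps = {key: stability_gap(devs) for key, devs in runs.items()}
for key, gap in gaps.items():
    print(key, gap)
```

A gap near zero suggests that repeated runs converged on the same optimum. Because the number of parameters is fixed within a model, the same gap applies to the AIC values of the two runs.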
By experiment-level data, we mean the data in ol17_e1 as a collection of the data provided by all 19 participants. The manner in which the data, model-fitting results, and predictions of an experiment are summarized can vary depending on the goal and approach.
Here we treated all data in ol17_e1 as if they had been provided by a single participant (rather than by 19 participants). We fitted all models to these aggregate data. The model comparison shows relative model fits on the basis of the fit to the aggregate data.
In the error distribution and summary statistics graphs, the data are the observed data aggregated across individuals. The predictions (error distributions, resultant summary statistics, and normalized RMSD relative to the aggregate data) are based on the best-fit parameter estimates of the aggregate fit for all models.
In this tab, all graphs are shown for the purpose of illustrating the performance in the experiment on average, i.e., averaged across participants.
The model comparison plot is based on averaging models’ fits relative to the best-fit model. We show two rows of graphs. The top row includes all participants, and the bottom row excludes participants with extreme test/hold-out set deviance in LOSsO-CV for one of the models, to allow comparison of model comparison results across approaches (this was also done in the model comparison plots in the manuscript). Here in ol17_e1, we excluded 1 participant (ol17_e1_15); the number of participants in each panel is shown in the top-right corner.
The predictions of the behavioral signature pattern (error distribution and resultant summary statistics) are based on averaging the parameter estimates of the best-fit individual fits. The observed data is given by the aggregate data for both graphs.
In the tab “Averaging individual parameter estimates”, we looked at the prediction of error distributions and resultant summary statistics on the basis of averaging participants’ best-fit parameter estimates. Here, the predictions are based on averaging individuals’ predictions (which were based on their best-fit parameter estimates). For the summary statistics (across set sizes, and normalized RMSD) we show two graphs: A) deriving the summary statistics from the averaged error distributions (for NRMSDs, compared to the summary statistics of the aggregate data), and B) averaging individuals’ summary statistics, and similarly aggregating individuals’ normalized RMSDs.
Plots labeled “A”. The following summary statistics are derived from the averaged error distribution above. The graphs with the NRMSD show the comparison of these averaged predictions to the aggregate data. This is not particularly useful, as the distribution underlying the aggregate data is not necessarily the average of the individuals’ data, but it is one approach to summarizing the data.
Plots labeled “B”. The following summary statistics are based on averaging participants’ summary statistics, for both the data (e.g., the summary statistics derived from individual observed error distributions) and the model predictions (e.g., the summary statistics derived from individual predicted error distributions). The graphs with the NRMSD show the normalized RMSD across individuals as boxplots to provide an idea of the spread of the NRMSD in the experiment for these models (i.e., NOT the normalized RMSD derived from contrasting averaged predicted and observed summary statistics).
Below are all graphs for all individuals in the experiment. Additionally, we included one graph showing the stability of the AIC model fits for all models by plotting the difference in deviance terms between the best and second-best run of each model for each individual’s data set.